The useR! Conference was held in Toulouse, France, and for me this was my second useR! after my first in Brisbane last year. This time around I wanted to write about my experiences and some highlights, similar to my post on the RStudio::Conference 2019 & Tidyverse Dev Day earlier this year. This blog post is divided into four sections: Programming, Shiny, {Packages}, and Touring Toulouse.
You can find slides and videos (in a week or so) in:
useR Materials Github Repo (also contains workshop stuff) courtesy of Suthira Owlarn
Timestamps for the Keynote Presentations courtesy of David Smith
As usual there were many talks that I didn't get to go to, as there are around 3-5 parallel tracks across different rooms, each featuring talks on a certain aspect of R such as Shiny, modelling, data handling, DevOps, education, etc. It also goes without saying that the talks highlighted below are mainly based on my interests, so they may not all be interesting to you specifically. In the coming weeks I'll also add video links to the presentations below as they become available from the R Consortium's Youtube channel.
Let’s begin!
Acknowledging the difficulty many users have with spread() and gather(), you might have heard about the creation of the pivot_wider() and pivot_longer() functions in recent months. You really should take a look at the work-in-progress vignette for a comprehensive understanding of the new functions, but the talk featured some live-coding by Hadley (Code) and some cool spread/gather animations from Charco Hui's master's thesis.
For more material you might be interested in Hiroaki Yutani’s tidyr 1.0.0 presentation from June’s Tokyo.R meetup. It’s mainly in Japanese but there are lots of code and explanatory graphics that may aid you in visualizing how the new functions work. You can also read a short English summary of the talk here.
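To make the reshaping direction concrete, here is a minimal sketch of the two new functions on a toy table (the data and column names are made up for illustration):

```r
library(tidyr)

# A tiny wide-format table: one column per year (made-up example data)
wide <- tibble::tibble(
  country = c("FR", "JP"),
  `2018`  = c(10, 20),
  `2019`  = c(12, 25)
)

# pivot_longer() replaces gather(): columns become rows
long <- pivot_longer(wide, cols = c(`2018`, `2019`),
                     names_to = "year", values_to = "value")

# pivot_wider() replaces spread(): rows become columns again
wide2 <- pivot_wider(long, names_from = year, values_from = value)
```

The names_to/values_to arguments spell out explicitly what the old key/value arguments of gather() left implicit, which is a big part of why the new functions are easier to reason about.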
Taking the tidy data principles into account but for grouped data, Romain Francois talked about the new group_*() functions in the {dplyr} package.
While in previous versions of {dplyr} working in a tidy manner with groups was done with group_by() and then dplyr::do(), the latter function has been deprecated and largely replaced by the {purrr} family of functions instead. In this context the group_map(), group_modify(), and group_walk() functions iterate like the {purrr} functions, but over groups instead. You can specify the function to apply to each group inline via a lambda, ~ (as below), or you can pass a function directly without the lambda.
group_split() operates similarly to base::split() but splits by groups, the output being a list with one slice per group. The group_keys() function returns the exact grouping structure of the data you used group_by() on, allowing you to check that the structure is right before you start applying functions to your data. group_data() and group_rows() give you different kinds of information about your grouped data, as can be seen below.
To shorten the group_by() %>% summarize() workflow you could instead use the summarize_at() function. You select specific columns with vars(), specify the functions to apply via a lambda, ~, and you can specify multiple functions with list().
Romain also talked about the {dance} package which is mainly used to experiment and test out possible new {dplyr} functions by leveraging the relatively new {vctrs} and {rlang} packages’ features. The package has a theme of using famous dance moves as the function names!
Lionel Henry talked about programming using {tidyverse} functions. As an introduction he went over data masking in {dplyr} and how it is optimized for interactive coding and single-use %>%s. The usage of non-standard evaluation (NSE) makes analyses easy as you can focus on the data rather than the data structure. However, we hit a stumbling block when it comes to when we want to create custom functions to program with {dplyr}. This is the difference between computing in the work space (as needed) versus computing in a data mask.
This is where tidyeval comes into play via {rlang} for flexible and robust programming in the tidyverse. However, {rlang} confused a lot of people due to the strange new syntax it introduced, such as !!, !!!, and enquo(). It also introduced new concepts such as quasi-quotation and quosures that made it hard to learn, especially for those without a programming background. Acknowledging this obstacle, {{ arg }} was introduced to make creating tidyeval functions easier. The new {{ }} (read as "curly-curly") operator was inspired by the {glue} package and is a shortcut for !!enquo(var).
Compared to an R script or R Markdown document, reproducibility suffers in Shiny apps as the outputs are transient and not archivable. RStudio's Joe Cheng talked about how reproducible analysis with Shiny is inconvenient, as re-enacting the user's interaction steps is necessary. A case for having a simple one-click button to view/download a reproducible artifact can be seen in various industries, such as:
- tools that build ggplots, regexes, and SQL queries, then insert the generated code into the source editor or console

Joe then went over the different possible outputs we might want from a Shiny app and the options available for generating them:

- Copy-paste: have a Shiny app **and** an R Markdown report
- Lexical analysis: automatically generate scripts from the app source code (static analysis and heuristics)
- Programmatic: meta-programming techniques to write code for **dual** purposes (execute interactively **and** export statically)

In light of the various pros and cons of the above options, Joe, with the help of Carson Sievert, created the {shinymeta} package!
There are four main steps to follow when using {shinymeta}:

1. Identify the domain logic inside the code and separate it from Shiny's reactive structure: metaReactive() creates a reactive() that returns a code expression, with metaObserve(), metaRender(), etc. as counterparts for observers and outputs, and metaExpr() for use inside functions
2. Within the domain logic you identified, identify references to reactive values and expressions that need to be replaced with static values and static code via !!
3. At run time, choose **which** pieces of domain logic to expose to the user with withMetaMode() or expandChain() (expandChain() turns !! code into a variable and introduces the code snippet above the function)
4. Present the code to the user: outputCodeButton() adds a button for a specific output, displayCodeModal() displays the underlying code, downloadButton() lets people download an R script or R Markdown report, and buildScriptBundle() or buildRmdBundle() generate .zip bundles dynamically

Joe, Carson, and the rest of the Shiny team acknowledge that there are still some limitations and future directions to explore.
There’s a lot to take in (this was probably the toughest talk for me to explain in this post…), so besides watching the keynote talk yourself you can also take a look at the shinymeta package website.
Vincent Guyader, from another French R organization ThinkR, talked about the new {golem} package which creates a nice framework for building robust production-ready Shiny apps.
One of the key principles in R is that when you find yourself repeatedly writing or using the same code or functions, you should write a package, and this is no different for Shiny apps. The reasons Vincent stated were:
With the package infrastructure, you need to have the ui.R and server.R (app_ui.R and app_server.R respectively in {golem}) in the R directory and all you need to run your app is the run_app() function.
{golem} also has functions that make it easy to deploy your app via R Studio Connect, shinyproxy, Shiny server, heroku, etc.
For styling your app with customized JavaScript and CSS files, you can easily add them to your Shiny app package directory via the add_js_file() and add_css_file() functions. You can do the same for Shiny modules with add_module(). As {golem} projects are packages, you have all the great attributes of an R package available to you, such as unit testing, documentation, and continuous integration/deployment!
Victor Perrier and Fanny Meyer from dreamRs talked about the various Shiny packages that can extend the functionality of your Shiny applications!
The first and probably the most well-known of this group is the {shinyWidgets} package which gives you a variety of cool custom widgets that you can add to make your Shiny app via JavaScript and CSS.
Next, wondering how exactly users interacted with their Shiny apps and whether they used the included widgets, the dreamRs team created the {shinylogs} package. This package records any and all inputs that are changed, as well as the outputs and errors. This is done by storing the JavaScript objects via the localForage JavaScript library. With this in place Shiny developers can see the number of connections per day, the user-agent family, the most viewed tabs, etc.
The {shinybusy} package gives the user feedback when a server operation is running or busy, such as a spinning circle, a moving bar, or even any kind of gif you choose!
Last but not least is the {shinymanager} package which allows you to administrate and manage who can access your application and protects the source code of your app until authentication is successful!
dreamRs is also the organization that created the {esquisse} package, which lets you interactively make ggplot2 graphs with an RStudio addin!
Talking about packages leads me to the next section…
I’ve been curious about data.table so I decided to go to this talk to learn more from Arun Srinivasan, one of the authors of the package. Starting off with some trivia, I finally learned that the reason for the seal on the hex sticker is because seals make an “aR! aR! aR!” sound according to {data.table} creator Matt Dowle, which I thought was pretty great!
Compared to a year ago there has been a lot of change and progress in {data.table}:
A key principle of {data.table} is that there are no dependencies or imports in the package!
The general form of using {data.table} is DT[i, j, by]: subset rows with i, compute j, grouped by by.
Arun also showed us some examples:
At the end he also talked about the new optimization and functionalities in the package.
- froll(), coalesce(), and nafill()

At the end of the talk Arun thanked the 69 people (among them Michael Chirico, Philippe Chataignon, Jan Gorecki, etc.) who have contributed a lot to what {data.table} is today!
The {polite} package is one I've been using for over a year now (you might've seen me use it in my soccer or TV data viz) and I was delighted to hear that the creator was giving a lightning talk on it! Dmytro began with a few do's and don'ts concerning user-agents and being explicit about them:
Secondly, you should always check the website's robots.txt, a file that stipulates various conditions for scraping activity. This can be done via @hrbrmstr's {robotstxt} package or by checking the output from polite::bow("theWebsiteYouAreScraping.com") (the polite::bow() function is what establishes the {polite} session)!
After getting permission you also need to limit the rate at which you scrape: you don't want to overload the servers of the website you are using, so no parallelization! This can be done with the {ratelimitr} package or purrr::slowly(), while the {polite} package automatically delays by 5 seconds when you run polite::scrape().
After scraping, you should definitely cache your responses with {memoise}, which is what is used inside the polite::scrape() function. Also, wrap your scraper function with something like purrr::safely() so it returns a list of two components: a "result" for successes and an "error" object for failures in your scraping.
You can also read his blog post on the talk here which explains a bit more about the polite::use_manners() function that allows you to include {polite} scrapers into your own R packages.
Hannah Frick from Mango Solutions talked about {goodpractice}, a package that gives you advice about good practices for building an R package. Running goodpractice::gp() performs static code analysis and can run around 200 of the available checks.
A cool thing is that you can customize the checks it runs: set your own standards beforehand and run the checks based on those standards with the make_check() and make_prep() functions. It's a great package that I've used before at work and for my own packages, so definitely try it out!
Riva Quiroga talked about translating the "R for Data Science" book and R data sets into Spanish. This came about because learning R (or any programming language) can be tough for a non-English speaker, as it means you not only have to learn the programming but also figure out what the documentation and use cases in English even mean. To address this language gap the R4DS Spanish translation community project, Ciencia de Datos, was born on Github! Through Github and Slack the organization sought to translate both the book and the various data sets available in base R, for example turning "diamonds" into "diamantes". However, they found that simply trying to rename() everything was not sustainable, so they had to find an alternative. This alternative ended up being the {datalang} package.
This package (created by RStudio's Edgar Ruiz) uses a YAML spec file that translates the variable names, value labels, help files, etc. into the language you want. After creating the spec file you just pass it as an argument to the datalang::translate_data()/translate_folder() functions and you'll have a translated data set! The R para Ciencia de Datos Twitter account also hosts a Spanish version of #TidyTuesday called #DatosDeMiercoles, so check it out!
Another thought I had after this presentation was that maybe this might be a good idea for Japanese?
RStudio's Joe Rickert talked about the R Consortium's Working Groups, an initiative to foster innovation among individuals and companies. Any individual or group can apply to create a working group to explore what R and other technologies can do in a certain field of interest. Throughout the talk Joe gave examples of successful working groups such as:
As advice for potential working groups Joe said that one should pick a project with a very wide scope which can benefit from collaboration between members and which can benefit a large portion of the R community.
In the last keynote of the conference Julien Cornebise talked about using technology tools for good using lots of examples throughout his life for both good and bad projects.
Here are some quotes I was able to jot down:
On using technology for good:
“Technology is not a solution, it is an accelerator; essentially you just have a better optimizer, you’re just fitting better to the incentives we have around us as a society.”
On the motivation of getting involved in #DataForGood projects:
“Are you here to solve the problem or are you here for a really cool application of your fantastic new theory and algorithm?”
On “hackathon syndrome” of many solutions to #DataForGood problems:
“Github is a big cemetery of really good ideas … where do we find software engineers, where do we find the designers, how do we go from the solution to the project to a real product that can be used by many many people?”
Some of the projects he talked about were:
This is definitely a talk I would recommend everybody to watch and you can do so from here!
As I was only heading home on the following Monday, I had the entire weekend to explore Toulouse! I was staying near the Capitole and as Toulouse is pretty walkable I didn't have to use public transportation at all during my stay. I think I just about walked every street in the city center! Unfortunately, the Musée des Augustins was closed but I was able to visit most of the other sites! Below are some pictures:
Sunday was also Bastille Day so there were some fireworks on display as well. All in all I had a great time in Toulouse!
This was my second useR! Conference and I enjoyed it quite a lot, not to mention I got to do some sightseeing which I wasn’t able to do much of in Brisbane last year. I met a lot of people that I follow on Twitter and I’ve had people come up to me who recognized me from all the data viz/blog posts I do (a first for me) which was really cool (and it helps as I’m very nervous about approaching people especially since they are usually surrounded by other people and I don’t want to interrupt their conversation and… “Oh no it’s time for the next session!”, etc.)!
During a post-conference dinner I had with a dozen or so random R users that were still in Toulouse we all talked about how important the community is. With how open everything is in regards to the talks being recorded and the materials being put online you don't necessarily have to come all the way to the conference to be able to learn the material. However, the important component of these conferences is being able to talk to the people and engage with the community, which is something I've really felt to be a part of since I started R and going to conferences in the past 2 years or so. I think nearly every one of the people I sat with at the table at dinner that night came from a different country and worked in a completely different area, which made for some real eye-opening discussion about how R is used worldwide and across industries. I also learned about cultural differences in tech, especially women in tech in Nigeria from Alimi Eyitayo (who also gave a talk on Scaling useR Communities with Engagement and Retention Models at the conference).
There were still a ton of great talks I missed so I’m excited to watch the rest on Youtube. I think I will be at RStudio::Conference next year in San Francisco so hopefully I’ll see some of you there!